feat: Adding RAG_FAQ.MD to retrieval-augmented-generation #655
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
See comments
Since most of these are related to the intro_multimodal_rag notebook, can you add these as a section at the end of that notebook?
@lavinigam-gcp What do you think?
Reasons behind keeping it as a standalone doc:
- The doc will grow as more questions come up. *.md files don't need to go through as rigorous a pull/merge process as *.ipynb, so maintenance would be easier.
- From a customer's standpoint, it might be more useful to refer to one RAG Q&A doc rather than look through every notebook in the directory.
That being said, it is your repo; happy to "disagree and commit". It is more important to get the content in.
I understand your reasoning. I'm thinking that the questions specific to the one notebook should be in the notebook itself, or maybe an FAQ.md in that specific directory, or just more specific explanations in the main content of the notebook.
The rest of the general RAG FAQ should really be turned into official documentation.
Most questions apply to
https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb

Q: Why do we need both variables: text_embedding_page (the embedding of the entire page) and text_embedding_chunk (the embedding of each text chunk)?
Use code font when referring to variables/methods
Q: Why do we need both variables: text_embedding_page (the embedding of the entire page) and text_embedding_chunk (the embedding of each text chunk)?
A: We use the chunk embeddings for most of the downstream tasks in the notebook. Both are kept for demonstration purposes: you can either embed the whole page or divide it further into smaller chunks. A lot depends on 1) the LLM token limit, and 2) how you want your data to be structured and searched. For example, Gemini 1.0 has an 8k token limit, so we can easily send a single page (or multiple pages), and the same goes for Gemini 1.5, where you can probably send a whole document in one go. One rationale for chunking is to make sure we don't have too much noise when searching for something specific. Hope that clarifies?
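The page-versus-chunk trade-off above can be sketched with a minimal character-based chunker. This is a simplification for illustration only; the notebook's actual splitting logic and the `chunk_size`/`overlap` values here are assumptions, not taken from the source.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping, fixed-size character chunks.

    Overlap helps a chunk keep enough surrounding context to be
    retrievable on its own.
    """
    assert 0 <= overlap < chunk_size
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


page = "word " * 400            # stand-in for the text of one PDF page
page_level = [page]             # whole-page granularity (text_embedding_page)
chunk_level = chunk_text(page)  # chunk granularity (text_embedding_chunk)
```

Each element of either list would then be sent to the embedding model; the chunk-level list simply yields more, smaller vectors for the same page.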
Q. The accompanying presentation (and https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro-textemb-vectorsearch.ipynb) talks about the ANN method of finding the closest embeddings match, but the code uses cosine similarity… Should we even mention ANN for this notebook?
Turn that notebook link into a hyperlink.
A: Yes. In the notebook, we use simple "cosine similarity", but in the real world at big scale, you would want something like a managed ANN index, as covered in the presentation.
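For context, the brute-force approach the answer refers to can be sketched as follows. This is a generic illustration (pure Python, not code from the notebook); the function names are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Exact cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query_emb, corpus_embs, k=3):
    """Exact O(N) search over every stored embedding -- fine for a demo
    corpus. A managed ANN index returns approximate neighbours in
    sub-linear time, which is what matters at millions of vectors."""
    scores = [(i, cosine_similarity(query_emb, e))
              for i, e in enumerate(corpus_embs)]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:k]
```

The trade-off is exactness versus query latency: brute force is exact but scans everything; ANN indexes prune the search space at the cost of occasionally missing the true nearest neighbour.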
Q. The cell which calls get_gemini_response() generates different responses if I run it several times, even with temperature=0 and the same 'query' and 'context' variable. Is this expected? Can anything be done to make it generate the same response every time?
A. Yeah, that behavior is expected. Temp=0 doesn't solve it. I am not sure if there is any plan in the roadmap to address it. You can however make prompts do that magic by forcing structure like JSON.
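One way to read "forcing structure like JSON" is a prompt template along these lines. This template is hypothetical (not from the notebook); it constrains the output shape so responses are comparable across runs, even though the wording inside the fields may still vary.

```python
# Hypothetical prompt template: requesting a fixed JSON shape makes the
# model's output structurally stable even when generation is not
# fully deterministic.
PROMPT_TEMPLATE = """Answer the question using only the provided context.
Respond with JSON using exactly these keys:
{{"answer": "<short answer>", "evidence": "<quote from the context>"}}

Context:
{context}

Question:
{question}
"""


def build_prompt(context, question):
    """Fill the template; the doubled braces above survive .format()."""
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

The filled-in prompt would then be passed to the model call (e.g. a function like get_gemini_response in the notebook) in place of a free-form question.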
Make this more formal, instead of "Yeah" put "Yes" also fully spell out the variable name. See comment above about code formatting.
A. The PDF is split just to show that you can read multiple PDFs with the logic. That's the only reason.
Q. For text embeddings, we are splitting the docs into smaller "chunks". Are we doing the same for images?
A. No we are not chuncking images. We have two kinds of image embeddings - one uses ‘multimodal-embeddings’ where we send image directly and the API returns embeddings.
Suggested change:
- A. No we are not chuncking images. We have two kinds of image embeddings - one uses ‘multimodal-embeddings’ where we send image directly and the API returns embeddings.
+ A. No we are not chunking images. We have two kinds of image embeddings - one uses `multimodal-embeddings` where we send the image directly and the API returns embeddings.
Q. For text embeddings, we are splitting the docs into smaller "chunks". Are we doing the same for images?
A. No we are not chuncking images. We have two kinds of image embeddings - one uses ‘multimodal-embeddings’ where we send image directly and the API returns embeddings.
The second method is - we send an image to Gemini - get a description of it as text - and send that text to the ‘text-embeddings’ model and get embeddings back. In the latter case, we send the whole text as is. No chunking.
Turn into a bullet point list.
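The two image-embedding paths described in that answer can be sketched as below. The model objects and method names here are placeholders standing in for whatever SDK is used, not a real API surface.

```python
def embed_image_directly(image_bytes, multimodal_model):
    # Path 1: the image is sent straight to a multimodal embedding
    # model, which returns an embedding vector.
    return multimodal_model.embed(image=image_bytes)


def embed_image_via_description(image_bytes, llm, text_embedding_model):
    # Path 2: an LLM describes the image as text; the whole description
    # is then embedded as a single piece of text -- no chunking.
    description = llm.describe(image=image_bytes)
    return text_embedding_model.embed(text=description)
```

Path 2 is useful when only a text-embedding model is available, or when the description itself (chart contents, diagram labels) is what you want to make searchable.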
Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or “trouble-ticket” use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
Change quote marks to straight quotes "
instead of curly quotes “”
Suggested change:
- Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or “trouble-ticket” use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
+ Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or "trouble-ticket" use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
Co-authored-by: Holt Skinner <13262395+holtskinner@users.noreply.github.com>
…ive-ai into pr/sermolin/655
I added the spell checker and fixed the identified errors.
@check-spelling-bot Report
🔴 Please review
See the 📂 files view, the 📜 action log, or 📝 job summary for details.
Unrecognized words (3): chuncking
Closing this PR due to inactivity. Please re-open if changes are made.
Description
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- CONTRIBUTING Guide
- CODEOWNERS for the file(s)
- `nox -s format` from the repository root to format

Fixes #<issue_number_goes_here> 🦕